- The parameter vector can be made the output of a function.
- e.g. weight sharing with a "Y connector" (tying components of W together)
- the shared weights are forced to be equal; this is the basis of many architectural ideas
- the gradients of shared weights are summed during backpropagation (see the sketch below)
- Hypernetwork: weights of one network are computed as the outputs of another network. (will come back in a few weeks)
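A minimal PyTorch sketch of weight sharing (shapes are illustrative): the same weight tensor is used in two branches, and autograd sums the gradient contributions from both uses.

```python
import torch

# One shared weight used in two places ("Y connector").
w = torch.randn(3, requires_grad=True)
x1 = torch.randn(3)
x2 = torch.randn(3)

# The same w parameterizes both branches.
y = torch.dot(w, x1) + torch.dot(w, x2)
y.backward()

# Autograd accumulates (sums) the gradient from each use of w.
print(torch.allclose(w.grad, x1 + x2))  # True
```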
- Detecting a motif anywhere in an input
- e.g. detecting whether there is a speech signal
- use a detector that slides over the input, with every position sharing the same weights
- the outputs at all positions go into a max function
- similarly, a template can be swept over an image to detect motifs
- Shift invariance – output unchanged with shift in input
- Shift equivariance – the output shifts correspondingly when the input is shifted (see the sketch below)
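A small PyTorch sketch (tensor sizes are assumptions) of a shared-weight detector slid over a 1D input: the convolution output is shift-equivariant, and taking the max over positions gives a shift-invariant response.

```python
import torch
import torch.nn.functional as F

# A shared-weight detector slid over a 1D input (shape: batch, channels, length).
x = torch.zeros(1, 1, 10)
x[0, 0, 3] = 1.0                       # motif at position 3
w = torch.tensor([[[1.0, 2.0, 1.0]]])  # one detector, shared at every position

out = F.conv1d(x, w)                   # equivariant: shifting x shifts out
x_shift = torch.roll(x, 2, dims=-1)    # move the motif two steps to the right
out_shift = F.conv1d(x_shift, w)

print(torch.allclose(torch.roll(out, 2, dims=-1), out_shift))  # True (equivariance)
print(out.max() == out_shift.max())                            # True (max gives invariance)
```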
- Convolution: $y_i = \sum_j w_j\, x_{i-j}$
- the index runs backwards through the input window as it runs forward through the weights
- in 2D: $y_{ij} = \sum_{k,l} w_{kl}\, x_{i+k,\, j+l}$ (note the plus sign: this is really cross-correlation)
- Cross-correlation
- the index and the weights move forward together; this is what deep learning libraries actually compute under the name "convolution"
- w is the convolution kernel
- stride: move the window forward by more than one step at a time
- mismatches at the borders are generally handled by padding the input (see the sketch below)
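A quick sketch (shapes are illustrative) of how stride and padding change the output length; note that `F.conv1d` computes cross-correlation, so flipping the kernel gives the true convolution.

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 1, 8)               # (batch, channels, length)
w = torch.randn(1, 1, 3)               # one kernel of size 3

print(F.conv1d(x, w).shape)            # length 8 - 3 + 1 = 6
print(F.conv1d(x, w, stride=2).shape)  # stride 2 -> length 3
print(F.conv1d(x, w, padding=1).shape) # "same" padding -> length 8

# F.conv1d is cross-correlation; flip the kernel to get a true convolution.
true_conv = F.conv1d(x, w.flip(-1))
```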
- Inspired by biology
- the brain recognizes objects in about 100 ms
- cells can be very specialized, yet invariant to irrelevant transformations
- simple cells detect local features
- complex cells pool outputs of simple cells
- a complex cell combines the outputs of all its simple sub-cells
- Architecture
- filter bank / non-linearity / pooling
- Modern architecture
- Normalization
- Filter bank
- Non-linearity
- Feature pooling (generally max pooling)
- max; Lp pooling (the p-th root of the sum of p-th powers); probability pooling
- essentially any function that returns the same value regardless of where the feature sits within the pooling window
- (repeat the normalization / filter bank / non-linearity / pooling block several times)
- Classifier
- Fully connected layers
- can be viewed as 1×1 convolutions (see the sketch below)
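A minimal sketch, assuming illustrative layer sizes, of the normalization / filter bank / non-linearity / pooling stage repeated twice, with the classifier written as a 1×1 convolution.

```python
import torch
import torch.nn as nn

def stage(c_in, c_out):
    # normalization -> filter bank (conv) -> non-linearity -> max pooling
    return nn.Sequential(
        nn.BatchNorm2d(c_in),
        nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.MaxPool2d(2),
    )

model = nn.Sequential(
    stage(1, 16),                      # repeat the stage several times
    stage(16, 32),
    nn.Conv2d(32, 10, kernel_size=1),  # "fully connected" classifier as a 1x1 conv
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),                      # -> (batch, 10) logits
)

logits = model(torch.randn(8, 1, 28, 28))
print(logits.shape)                    # torch.Size([8, 10])
```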
- LATER: read LeCun et al. (1998) and implement it with PyTorch; also implement it with my own code (a sketch follows below)
- github.com/activatedgeek/LeNet-5
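A compact LeNet-5-style sketch in PyTorch, following LeCun et al. (1998) in spirit; the exact activation and pooling choices here are simplifications, not a faithful reproduction.

```python
import torch
import torch.nn as nn

class LeNet5(nn.Module):
    # LeNet-5-style convnet for 32x32 single-channel inputs (e.g. padded MNIST).
    def __init__(self, n_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5),    # C1: 6 feature maps, 28x28
            nn.Tanh(),
            nn.AvgPool2d(2),                   # S2: 14x14
            nn.Conv2d(6, 16, kernel_size=5),   # C3: 16 feature maps, 10x10
            nn.Tanh(),
            nn.AvgPool2d(2),                   # S4: 5x5
            nn.Conv2d(16, 120, kernel_size=5), # C5: 120 feature maps, 1x1
            nn.Tanh(),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(120, 84),                # F6
            nn.Tanh(),
            nn.Linear(84, n_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

print(LeNet5()(torch.randn(1, 1, 32, 32)).shape)  # torch.Size([1, 10])
```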
- Multiple character recognition
- slide the convnet over the input, shifting it across positions
- this gives a character prediction at each position
- but recomputing the network at every position becomes extremely wasteful
- Instead
- take a large input and keep convolving over the whole thing
- a convolutional layer then produces multiple outputs at once
- much cheaper than recomputing the network at every location
- another approach was to train the convnet to output the character at the middle of its viewing window (see the sketch below)
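A sketch (sizes assumed) of why convolving the whole input is cheap: a network made only of convolutions and pooling can take a wider input and directly produce one prediction per location, instead of being re-run on every crop.

```python
import torch
import torch.nn as nn

# A tiny fully convolutional "character detector" (illustrative sizes).
net = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(8, 16, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 10, kernel_size=5),   # plays the role of the classifier
)

single = net(torch.randn(1, 1, 32, 32))   # one 32x32 window -> one prediction (1, 10, 1, 1)
wide = net(torch.randn(1, 1, 32, 128))    # a wide strip -> one prediction per location (1, 10, 1, 25)
print(single.shape, wide.shape)
```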
- Convnets are good for
- tasks that are shift invariant and distortion invariant
- where the sizes of the objects change a lot
- smaller or larger objects can be detected by applying the same convnet to progressively rescaled inputs
- multi-dimensional array signals
- with strong local correlations between values
- that become less correlated with distance
- features can appear anywhere, which is why shared weights make sense
- a fully connected net does not care about permutations of its input (and so cannot exploit this structure)
- Practicum
- signals can be represented as vectors, e.g. an audio waveform
- words can be represented as one-hot vectors; language has the same kind of structure
- Receptive field = how many neurons of the previous layer a given neuron sees
- sparsity (local connections) is justified only because the data exhibits locality
- stationarity – the same patterns appear again and again, so parameters can be shared
- parameter sharing leads to
- faster convergence
- better generalization
- not constrained on the input size (can keep shifting)
- kernel independence => high parallelization
- kernels on 1D data
- the 1D example uses 3 kernels (see the sketch below)
- odd-sized kernels, so the window extends evenly on both sides of the center
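A sketch of 1D convolution with 3 odd-sized kernels (sizes assumed); each output position has a receptive field of 3 input samples.

```python
import torch
import torch.nn as nn

# 3 kernels of (odd) size 3 over a 1-channel 1D signal.
conv = nn.Conv1d(in_channels=1, out_channels=3, kernel_size=3, padding=1)

x = torch.randn(1, 1, 100)   # (batch, channels, length); any length works
y = conv(x)
print(y.shape)               # torch.Size([1, 3, 100]): one feature map per kernel

# Each y[:, :, t] depends only on x[:, :, t-1:t+2] -> receptive field of 3.
```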
- Standard spatial CNN
- multiple layers of
- conv
- non-linearities
- pooling
- batch normalization
- residual bypass connections (see the sketch below)
- logits at the end for classification
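A minimal sketch (assumed channel counts) of a residual bypass connection wrapped around conv / batch-norm / non-linearity layers, with logits at the end for classification.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    # conv -> batch norm -> ReLU -> conv -> batch norm, plus a bypass connection
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        return torch.relu(x + self.body(x))   # residual bypass: add the input back

model = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1),
    ResidualBlock(32),
    nn.MaxPool2d(2),
    ResidualBlock(32),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(32, 10),          # logits for classification
)

print(model(torch.randn(2, 3, 32, 32)).shape)  # torch.Size([2, 10])
```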
- Geoff Hinton – capsule networks